The focus of our projects have been around joining data from radio stations around the country with census data. We have a list of every radio station in the United States and information about it, as well as the census data from data.world.
require(readr)
require(plyr)
require(dplyr)
# Use Radio Stations.csv
file_path = "Radio Stations.csv"
# Read CSV into dataframe
df <- readr::read_csv(file_path)
# Analyze current columns
names(df)
# Isolate columns we want
measures <- c("Callsign", "Frequency", "City", "State", "Licensee", "Format")
df <- df[measures]
# Remove rows with incomplete information
df <- df[complete.cases(df),]
# Remove incomplete formats
df <- df[!(df$Format=="Format"),]
# Adjust format to match
findFormat <- function(format) {
currentFormat <- unlist(strsplit(tolower(format), " "))
if ("christian" %in% currentFormat || "gospel" %in% currentFormat || "religious" %in% currentFormat || "religion" %in% currentFormat) {
return("Religious")
}
else if ("pop" %in% currentFormat || "contemporary" %in% currentFormat || "mainstream" %in% currentFormat || "top" %in% currentFormat || "hot" %in% currentFormat || "hit" %in% currentFormat) {
return("Pop or Contemporary")
}
else if ("rock" %in% currentFormat || "alternative" %in% currentFormat || "indie" %in% currentFormat) {
return("Rock, Alternative, or Indie")
}
else if ("country" %in% currentFormat || "southern" %in% currentFormat) {
return("Country")
}
else if ("urban" %in% currentFormat || "hip" %in% currentFormat || "rap" %in% currentFormat || "hip-hop" %in% currentFormat) {
return("Hip-hop")
}
else if ("jazz" %in% currentFormat || "blues" %in% currentFormat) {
return("Jazz or Blues")
}
else if ("latin" %in% currentFormat || "mexican" %in% currentFormat || "international" %in% currentFormat) {
return("International")
}
else if ("oldies" %in% currentFormat) {
return("Oldies")
}
else if ("news/talk" %in% currentFormat || "news" %in% currentFormat || "talk" %in% currentFormat || "public" %in% currentFormat) {
return("News and Talk")
}
else {
return("Other")
}
}
# This returns the found format for every format
df$Format <- apply(df["Format"], 1, findFormat)
# Write cleaned up data to file
write.csv(df, row.names=FALSE, file="PostETL-Radio.csv")
Our radio station data is 13,627 radio stations from across the United States. For each station, we have the Callsign, Frequency, City, State, Licensee, and Format. The format has been standardized from each stations self-described format to be one of the following formats: Religious, Pop or Contemporary, Rock or Alternative or Indie, Country, Hip-hop, Jazz or Blues, International, Oldies, News and Talk, or Other.
We’re using stock census data, with some slight modifications. We have each state, it’s total population, male population, female population, 2016 political party affiliation, and a custom Coast column that’s either “West”, “East”, or “None”.
Calculated as the 100,000 * ([Number of Stations] / [Total Population])
Calculated as the 100,000 * ([Number of Stations] / [Male Population])
Calculated as the 100,000 * ([Number of Stations] / [Female Population])
Our non-aggregated measures analysis is a box-and-whiskers plot. The columns are each Format, the rows are an adjusted Stations per Capita, and the data points are each state. This is an interesting way to view the relative frequencies of each format and a simple way to find outliers of states for each format. For example, Washington D.C. has an abnormally large amount of News and Talk radio stations (third data point from the top of News and Talk).
Our aggregated measures analysis in Tableau is a histogram. We bin states by their total population, and count the amount of radio stations that belong to states in each bin. This allows us to see the relative frequency of station formats in smaller and larger states. For example, Religious radio stations (brown) become increasingly less popular as state size increases.
The aggregated measures analysis in Shiny is a bar plot. The total number of radio stations for each state with the color showing the formats of the stations. States populations are shown in the X axis in units of 1 million. The general trend shows that the number of radio stations increases with population.
Our scatter plots compare the amount of female listeners per capita to the amount of male listeners per capita. Here we can easily visualize each states relative male and female population for the amount of stations in the state.
Crosstabs were really the first assignment where we utilized our listener KPIs. We created several crosstabs that allowed us to visualize the amount of listeners per capita for each format for each state. The listeners per capita were divided into three demographics - total, male, and female. This is an interesting way to see which formats are more popular among the different sexes.